k-means++ under Approximation Stability
Authors: Manu Agarwal, Ragesh Jaiswal and Arindam Pal
Abstract
Lloyd's algorithm, also known as the k-means algorithm, is one of the most popular algorithms for solving the k-means clustering problem in practice. However, it gives no performance guarantees: there are datasets on which it can behave very badly. One reason for poor performance on certain datasets is bad initialization. The following simple sampling-based seeding procedure tends to fix this problem: pick the first center uniformly at random from the given points, and then, for i ≥ 2, pick a point to be the i-th center with probability proportional to its squared distance from the nearest previously chosen center. This procedure is popularly known as the k-means++ seeding algorithm and is known to exhibit some nice properties, which have been studied in a number of previous works [AV07, AJM09, ADK09, BR11]. The algorithm tends to perform well when the optimal clusters are separated in some sense, since it gives preference to far-away points when picking centers. Ostrovsky et al. [ORSS06] discuss one such separation condition on the data. Jaiswal and Garg [JG12] show that if the dataset satisfies the separation condition of [ORSS06], then the seeding algorithm gives a constant-factor approximation with probability Ω(1/k). Another separation condition, strictly weaker than that of [ORSS06], is the approximation stability condition discussed by Balcan et al. [BBG09]. In this work, we show that the seeding algorithm gives a constant-factor approximation with probability Ω(1/k) if the dataset satisfies the separation condition of [BBG09] and the optimal clusters are not too small. We also give a negative result for datasets that have small optimal clusters.
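To make the seeding step concrete, here is a minimal Python sketch of the procedure described above (often called D²-sampling). The function name kmeans_pp_seed and the representation of points as tuples of floats are illustrative assumptions, not details from the paper.

import random

def kmeans_pp_seed(points, k):
    """Pick k initial centers from points by k-means++ seeding (a sketch)."""
    # First center: uniformly at random from the data.
    centers = [random.choice(points)]
    for _ in range(1, k):
        # Squared distance of each point to its nearest chosen center.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        # Next center: sampled with probability proportional to d2,
        # so points far from all current centers are preferred.
        centers.append(random.choices(points, weights=d2, k=1)[0])
    return centers

# Example: two well-separated groups; the two seeds typically land one per group.
pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8)]
print(kmeans_pp_seed(pts, k=2))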
Similar papers
Thesis Proposal: Approximation Algorithms and New Models for Clustering and Learning
This thesis concerns two fundamental problems in clustering and learning: (a) the k-median and the k-means clustering problems, and (b) the problem of learning under adversarial noise, also known as agnostic learning. For k-median and k-means clustering we design efficient algorithms which provide arbitrarily good approximation guarantees on a wide class of datasets. These are datasets which sa...
Clustering under Approximation Stability
A common approach to clustering data is to view data objects as points in a metric space, and then to optimize a natural distance-based objective such as the k-median, k-means, or min-sum score. For applications such as clustering proteins by function or clustering images by subject, the implicit hope in taking this approach is that the optimal solution for the chosen objective will closely mat...
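For reference, the three objectives named in this snippet are standardly defined as follows (a LaTeX sketch; X is the point set, d the metric, c_1, ..., c_k the chosen centers, and C_1, ..., C_k the induced clusters; this notation is assumed for illustration, not taken from the cited work):

\Phi_{k\text{-means}} = \sum_{x \in X} \min_{1 \le i \le k} d(x, c_i)^2, \qquad
\Phi_{k\text{-median}} = \sum_{x \in X} \min_{1 \le i \le k} d(x, c_i), \qquad
\Phi_{\text{min-sum}} = \sum_{i=1}^{k} \sum_{x, y \in C_i} d(x, y).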
Dynamic Controllers Which Use Difference Approximates of Output Derivatives and Their Practical Stability
The derivative feedback is a classical but representative means in the design of control systems, and for practical reasons it is often replaced by its difference approximation. As the resulting closed-loop system involves a time-delay, it does not necessarily preserve stability however accurate the approximation is. Following the terminology of Palmor (1980), it may be said that the practical ...
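The difference approximation mentioned here is typically the backward difference of the output derivative (a LaTeX sketch; the step h > 0 is generic notation assumed for illustration, not from the cited work):

\dot{y}(t) \approx \frac{y(t) - y(t - h)}{h},

so a derivative feedback u(t) = K \dot{y}(t) is realized as u(t) = K (y(t) - y(t - h))/h, which introduces the delay h into the closed loop; this is why stability need not be preserved.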
On the Boundedness Property of Semilinear Sets
New three-step iteration process and fixed point approximation in Banach spaces
In this paper we propose a new iteration process, called the $K^{\ast}$ iteration process, for approximation of fixed points. We show that our iteration process is faster than the existing well-known iteration processes using numerical examples. Stability of the $K^{\ast}$ iteration process is also discussed. Finally we prove some weak and strong convergence theorems for Suzuki ge...
Journal: Theor. Comput. Sci.
Volume: 588, Issue: –
Pages: –
Publication date: 2013